Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels

Author

  • David Jurgens

Abstract

Word sense disambiguation aims to identify which meaning of a word is present in a given usage. Gathering word sense annotations is a laborious and difficult task. Several methods have been proposed to gather sense annotations using large numbers of untrained annotators, with mixed results. We propose three new annotation methodologies for gathering word senses in which untrained annotators are allowed to use multiple labels and to weight the senses. Our findings show that, given an appropriate annotation task, untrained workers can obtain agreement at least as high as that of annotators in a controlled setting, and in aggregate produce a sense labeling of equal quality.
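The abstract does not specify how the multi-label, weighted judgments are combined. The following is a minimal sketch that assumes each annotator's judgment for one usage is a mapping from sense identifiers to non-negative weights. That format, the function name `aggregate_sense_weights`, and the sense labels are hypothetical and are not taken from the paper. Each annotator's weights are normalized to sum to 1, and the normalized weights are then averaged into a single sense distribution for the usage.

```python
from collections import defaultdict


def aggregate_sense_weights(annotations):
    """Average per-annotator normalized sense weights into one distribution.

    Hypothetical input format: a list of dicts, one per annotator, mapping a
    sense identifier to a non-negative weight for a single usage.
    """
    totals = defaultdict(float)
    counted = 0
    for judgment in annotations:
        total_weight = sum(judgment.values())
        if total_weight <= 0:
            continue  # skip annotators who assigned no weight at all
        for sense, weight in judgment.items():
            totals[sense] += weight / total_weight  # normalize to sum to 1
        counted += 1
    if counted == 0:
        return {}
    # Average over annotators whose judgments were counted
    return {sense: value / counted for sense, value in totals.items()}


# Three hypothetical annotators rating two senses on a 0-5 scale
judgments = [
    {"sense1": 5, "sense2": 0},
    {"sense1": 3, "sense2": 2},
    {"sense1": 4, "sense2": 1},
]
print(aggregate_sense_weights(judgments))
```

Here the normalized weights for `sense1` are 1.0, 0.6 and 0.8, so the aggregate assigns it about 0.8 and assigns `sense2` about 0.2. Simple averaging is only one possible choice; other schemes, such as weighting annotators by reliability, would be equally compatible with the annotation setup the abstract describes.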

Similar resources

The Benefits of a Model of Annotation

This paper presents a case study of a difficult and important categorical annotation task (word sense) to demonstrate a probabilistic annotation model applied to crowdsourced data. It is argued that standard (chance-adjusted) agreement levels are neither necessary nor sufficient to ensure high quality gold standard labels. Compared to conventional agreement measures, application of an annotatio...

Crowdsourcing Experiment Designs for Chinese Word Sense Annotation

This paper tries to demonstrate our exploratory efforts in tackling the “high accuracy-low quantity” problem of the human word sense annotation task in Chinese, and ultimately to reach the goal of automatic word sense annotation. Our proposed annotation architecture consists of explicit and implicit aspects of the crowdsourcing approach. Explicit method focuses on the general issues of crowdsourci...

Crowdsourcing Word Sense Definition

In this paper, we propose a crowdsourcing methodology for a single-step construction of both an empirically-derived sense inventory and the corresponding sense-annotated corpus. The methodology taps the intuitions of non-expert native speakers to create an expert-quality resource, and natively lends itself to supplementing such a resource with additional information about the structure and relia...

When Is Word Sense Disambiguation Difficult? A Crowdsourcing Approach

We identified features that drive differential accuracy in word sense disambiguation (WSD) by building regression models using 10,000 coarse-grained WSD instances which were labeled on MTurk. Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), the example context (length), and the Turker’s engagement with our tas...

On the difficulty of making concreteness concrete

The use of labels of semantic properties like ‘concreteness’ is quite common in studies in syntax, but their exact meaning is often unclear. In this article, we compare different definitions of concreteness, and use them in different implementations to annotate nouns in two data sets: (1) all nouns with word sense annotations in the SemCor corpus, and (2) nouns in a particular lexico-syntactic ...

Publication date: 2013